Search CORE


Semi-supervised latent variable models for sentence-level sentiment analysis

Authors: McDonald, Ryan; Täckström, Oscar
Publication date: 01/01/2011

We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

A Universal Part-of-Speech Tagset

Authors: Das, Dipanjan; McDonald, Ryan; Petrov, Slav
Publication date: 01/01/2011

To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts of speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold-standard part-of-speech tags.

arXiv.org e-Print Archive

CiteSeerX

Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

Authors: McDonald, Ryan; Täckström, Oscar; Uszkoreit, Jakob
Publication date: 01/01/2012

It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters, and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Cancer-related health behaviors and health service use among Inuit and other residents of Canada’s north

Authors: McDonald, James Ted; Trenholm, Ryan

Objective – To identify the extent to which differences between Inuit and other residents of Canada’s North in a set of health behaviors and health service use related to cancer incidence and diagnosis can be accounted for by demographic, socio-economic and geographic factors.
Study Design – Data on residents aged 21-65 who live in Canada’s North are drawn from the 2000-01 and 2004-05 Canadian Community Health Surveys and the 2001 Aboriginal People’s Survey.
Methods – Multivariate logistic regression analysis is applied to 1) a set of health behaviors including smoking, binge drinking and obesity, and 2) a set of basic health service use measures including consultations with a physician and with any medical professional, Pap smear testing and mammography.
Results – Higher smoking and binge drinking rates and lower rates of female cancer screening among Inuit are not accounted for by differences in demographic characteristics, education, location of residence or distance from a hospital.
Conclusions – Factors specific to Inuit individuals and communities may be contributing to negative health behaviors associated with increased cancer risk, and to a lower incidence of diagnostic cancer screening. Policy interventions to address these issues may need to be targeted specifically to Inuit Canadians.
Keywords: Inuit, aboriginal, cancer screening, smoking, health

Research Papers in Economics

A dynamic modelling environment for the evaluation of wide area protection systems

Authors: Abdulhadi, I.F.; Burt, Graeme; McDonald, James; Tumilty, Ryan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2008

This paper introduces the concept of dynamic modelling for wide area and adaptive power system protection. Although the approach is not limited to these types of protection schemes, they were chosen because of their potential role in solving a multitude of protection challenges facing future power systems. The dynamic modelling will be implemented using a bespoke simulation environment. This tool allows for a fully integrated testing methodology, which enables the validation of protection solutions prior to their operational deployment. Furthermore, the paper suggests a distributed protection architecture which, when applied to existing and future protection schemes, has the potential to enhance their functionality and avoid mal-operation, given that the safety and reliability of power systems are paramount. This architecture also provides a means to better understand the underlying dynamics of the aforementioned protection schemes and will be rigorously validated using the modelling environment.

Crossref

University of Strathclyde Institutional Repository

PMKNS for PIE: Parsed Morphological KATR Networks of Sanskrit for Proto-Indo-European

Author: McDonald, Ryan Mark
Publication venue: UKnowledge
Publication date: 01/01/2020

In this thesis, I construct two computational networks for Sanskrit to test theories of nominal accentuation as a way of examining the simplicity of each theory. I examine the Paradigmatic Approach and the Compositional Approach to nominal accentuation. In the Paradigmatic Approach, nominals are categorized as mobile or static based on how the accent appears in the paradigm (Fortson 2010). In the Compositional Approach, accent mobility is a result of the combination of morphemes and their inherent accent states (Kiparsky 2010). To construct these networks, I use the KATR extension to the DATR language for lexical knowledge representation (Finkel et al. 2002). In Chapter 1, I give an overview of Proto-Indo-European (PIE) accentuation and KATR. Chapter 2 presents my methods and connects the hypothetical nature of PIE to the well-documented Indo-European (IE) language Sanskrit. In Chapters 3 and 4, I use guided derivations of the Sanskrit r-stem nominal pitr̥- and the Sanskrit a-stem nominal sukha- to walk through each step. Chapter 5 analyzes the results for the two networks from Chapters 3 and 4, presents the overall conclusions I have drawn from the project, and suggests further areas of expansion.

University of Kentucky

Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging

Authors: Das, Dipanjan; McDonald, Ryan; Nivre, Joakim; Petrov, Slav; Täckström, Oscar
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2013

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end, in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database